On Flat versus Hierarchical Classification in Large-Scale Taxonomies
Abstract
In this paper, we study flat and hierarchical classification strategies in the context of large-scale taxonomies. To this end, we first propose a multiclass, hierarchical, data-dependent bound on the generalization error of classifiers deployed in large-scale taxonomies. This bound provides an explanation for several empirical results reported in the literature on the performance of flat and hierarchical classifiers. We then introduce another type of bound targeting the approximation error of a family of classifiers, and derive from it features used in a meta-classifier to decide which nodes to prune (or flatten) in a large-scale taxonomy. Finally, we illustrate the theoretical developments through several experiments conducted on two widely used taxonomies.
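To make the pruning (flattening) operation mentioned in the abstract concrete, here is a minimal Python sketch on a toy taxonomy. It is not the authors' meta-classifier and does not compute either bound; it only shows what removing an internal node does to the tree. The node names, the `flatten_node`, `leaves`, and `depth_of` helpers, and the dictionary representation are all assumptions made for illustration.

```python
from typing import Dict, List

# Toy taxonomy as parent -> children. Leaves are the target classes.
# All node names are hypothetical.
Taxonomy = Dict[str, List[str]]

taxonomy: Taxonomy = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["tennis", "golf"],
}


def flatten_node(tax: Taxonomy, node: str) -> Taxonomy:
    """Return a copy of `tax` with internal node `node` pruned:
    its children are attached directly to its parent."""
    if node not in tax:
        raise ValueError(f"{node!r} is not an internal node")
    parent = next((p for p, kids in tax.items() if node in kids), None)
    if parent is None:
        raise ValueError("cannot prune the root")
    pruned = {p: list(kids) for p, kids in tax.items() if p != node}
    idx = pruned[parent].index(node)
    # Replace the pruned node by its children, keeping sibling order.
    pruned[parent][idx:idx + 1] = tax[node]
    return pruned


def leaves(tax: Taxonomy, root: str) -> List[str]:
    """List the leaf classes under `root`, left to right."""
    kids = tax.get(root)
    if not kids:
        return [root]
    out: List[str] = []
    for kid in kids:
        out.extend(leaves(tax, kid))
    return out


def depth_of(tax: Taxonomy, root: str, leaf: str) -> int:
    """Number of top-down decisions needed to reach `leaf` from `root`."""
    if root == leaf:
        return 0
    for kid in tax.get(root, []):
        if leaf in leaves(tax, kid):
            return 1 + depth_of(tax, kid, leaf)
    raise ValueError(f"{leaf!r} is not under {root!r}")


partly_flat = flatten_node(taxonomy, "science")
fully_flat = flatten_node(partly_flat, "sports")
```

In this sketch, pruning leaves the set of leaf classes unchanged but shortens the path to them: reaching `physics` takes two decisions in the original tree and one after `science` is pruned. Pruning every internal node below the root yields the flat setting, where a single classifier chooses among all leaves.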
Similar resources
Large-Scale Many-Class Prediction via Flat Techniques
Prediction problems with huge numbers of classes are becoming more common. While class taxonomies are available in certain cases, we have observed that simple flat learning and classification, via index learning and related techniques, offers significant efficiency and accuracy advantages. In the PASCAL challenge on large-scale hierarchical text classification, the accuracies we obtained ranked...
Learning Taxonomy Adaptation in Large-scale Classification
In this paper, we study flat and hierarchical classification strategies in the context of large-scale taxonomies. Addressing the problem from a learning-theoretic point of view, we first propose a multi-class, hierarchical, data-dependent bound on the generalization error of classifiers deployed in large-scale taxonomies. This bound provides an explanation for several empirical results reported in...
An Empirical Comparison of Flat and Hierarchical Performance Measures for Multi-Label Classification with Hierarchy Extraction
Multi-label Classification (MC) often deals with hierarchically organized class taxonomies. In contrast to Hierarchical Multi-label Classification (HMC), where the class hierarchy is assumed to be known a priori, we are interested in the opposite case where it is unknown and should be extracted from multi-label data automatically. In this case the predictive performance of a classifier can be a...
Training a hierarchical classifier using inter document relationships
Concept hierarchies, also called taxonomies or directories, are widely used on the World Wide Web to organize and present large collections of Web pages. They were originally developed to help users locate relevant information by browsing. More recently, conceptual search engines such as KeyConcept have been developed that retrieve documents based upon the concepts they discuss in addition to t...
Using Boolean Rule Extraction for Taxonomic Text Categorization for Big Data
Categorization hierarchies are ubiquitous in big data. Examples include MEDLINE’s Medical Subject Headings (MeSH) taxonomy, United Nations Standard Products and Services Code (UNSPSC) product codes, and the Medical Dictionary for Regulatory Activities (MedDRA) hierarchy for adverse reaction coding. A key issue is that in most taxonomies the probability of any particular example being in a categ...
Publication date: 2013